FLOATING POINT MULTIPLY ACCUMULATOR

Field 

The present invention relates generally to floating point operations, and more
specifically to floating point multiply accumulators.

Background 

Fast floating point mathematical operations have become an important feature
in modern electronics. Floating point units are useful in applications such as
three-dimensional graphics computations and digital signal processing (DSP). Examples
of three-dimensional graphics computation include geometry transformations and
perspective transformations. These transformations are performed when the motion
of objects is determined by calculating physical equations in response to interactive
events instead of replaying prerecorded data.

Many DSP operations, such as finite impulse response (FIR) filters, compute
Σ(a_i · b_i), where i = 0 to n-1, and a_i and b_i are both single precision floating
point numbers. This type of computation typically employs floating point multiply
accumulate (FMAC) units which perform many multiplication operations and add
the resulting products to give the final result. In these types of applications, fast
FMAC units typically execute multiplies and additions in parallel without pipeline
bubbles. One example FMAC unit is described in: Nobuhiro et al., "2.44-GFLOPS
300-MHz Floating-Point Vector Processing Unit for High-Performance 3-D
Graphics Computing," IEEE Journal of Solid State Circuits, Vol. 35, No. 7, July
2000.

The Institute of Electrical and Electronics Engineers (IEEE) has published an
industry standard for floating point operations in the ANSI/IEEE Std 754-1985,
IEEE Standard for Binary Floating-Point Arithmetic, IEEE, New York, 1985,
hereinafter referred to as the "IEEE standard." A typical implementation for a
floating point FMAC compliant with the IEEE standard is shown in Figure 1.
FMAC 100 implements a single precision floating point multiply and accumulate
instruction "D=(AxB)+C," as an indivisible operation. As can be seen from Figure
1, fast floating point multipliers and fast floating point adders are both important
ingredients to make a fast FMAC.

Multiplicands A and B are received by multiplier 110, and the product is
normalized in post-normalization block 120. Multiplicands A and B are typically in
an IEEE standard floating point format, and post-normalization block 120 typically
operates on (normalizes) the output of multiplier 110 to make the product conform to
the same format. For example, when multiplicands A and B are IEEE standard
single precision floating point numbers, post-normalization block 120 operates on
the output from multiplier 110 so that adder 130 receives the product as an IEEE
standard single precision floating point number.

Adder 130 adds the normalized product from post-normalization block 120
with the output from multiplexer 140. Multiplexer 140 can choose between the
number C and the previous sum on node 152. When the previous sum is used,
FMAC 100 is performing a multiply-accumulate function. The output of adder 130
is normalized in post-normalization block 150 so that the sum on node 152 is in the
standard format discussed above.

Adder 130 and post-normalization block 150 can be "non-pipelined," which
means that an accumulation can be performed in a single clock cycle. When
non-pipelined, adder 130 and post-normalization block 150 typically include sufficient
logic to limit the frequency at which FMAC 100 can operate, in part because floating
point adders typically include circuits for alignment, mantissa addition, rounding, and
other complex operations. To increase the frequency of operation, adder 130 and
post-normalization block 150 can be "pipelined," which means registers can be
included in the data path to store intermediate results. One disadvantage of
pipelining is the introduction of pipeline stalls or bubbles, which decrease the
effective data rate through FMAC 100.

For the reasons stated above, and for other reasons stated below which will
become apparent to those skilled in the art upon reading and understanding the
present specification, there is a need in the art for fast floating point multiply and
accumulate circuits.

Brief Description of the Drawings 

Figure 1 shows a prior art floating point multiply-accumulate circuit;

Figure 2 shows an integrated circuit with a floating point multiply-accumulate
circuit;

Figure 3 shows the exponent and mantissa paths of a floating point
multiply-accumulate circuit;

Figure 4 shows a mantissa multiplier circuit;

Figure 5 shows a floating point conversion unit;

Figure 6 shows a carry-save negation circuit;

Figure 7 shows a base 32 floating point number representation;

Figure 8 shows an exponent path of a floating point adder;

Figure 9 shows a mantissa path of a floating point adder;

Figure 10 shows an overflow detection circuit;

Figure 11 shows a post-normalization circuit; and

Figure 12 shows a sign detection circuit.

Description of Embodiments

In the following detailed description of the embodiments, reference is made
to the accompanying drawings which show, by way of illustration, specific
embodiments in which the invention may be practiced. In the drawings, like
numerals describe substantially similar components throughout the several views.
These embodiments are described in sufficient detail to enable those skilled in the art
to practice the invention. Other embodiments may be utilized and structural, logical,
and electrical changes may be made without departing from the scope of the present
invention. Moreover, it is to be understood that the various embodiments of the
invention, although different, are not necessarily mutually exclusive. For example, a
particular feature, structure, or characteristic described in one embodiment may be
included within other embodiments. The following detailed description is, therefore,
not to be taken in a limiting sense, and the scope of the present invention is defined
only by the appended claims, along with the full scope of equivalents to which such
claims are entitled.


Floating Point Multiply Accumulator 
Figure 2 shows an integrated circuit with a floating point multiply-accumulate
circuit. Integrated circuit 200 includes floating point multiplier 210,
floating point conversion unit 220, floating point adder 230, and post-normalization
circuit 250. Each of the elements shown in Figure 2 is explained in further detail
with reference to figures that follow. In this section, a brief overview of the Figure 2
elements and their operation is given to provide a context for more detailed
explanations that follow.

Each node in Figure 2 is shown as a single line for clarity. Most of these
nodes include many physical connections, or "traces," within integrated circuit 200.
For example, floating point numbers generally include sign bits, exponent fields, and
mantissa fields. Therefore, nodes that hold floating point numbers, such as nodes
202 and 204, include many physical connections within integrated circuit 200. This
convention is used throughout this description, and nodes shown as single lines are
not necessarily intended to represent a single physical connection.

Floating point multiplier 210 receives two floating point operands, operand A
on node 202, and operand B on node 204, and produces a floating point product on
node 212. The floating point product on node 212 is converted to a different floating
point representation by floating point conversion unit 220. Node 222 holds the
converted product generated by floating point conversion unit 220. This is in
contrast to the prior art implementation shown in Figure 1. In the implementation of
Figure 1, as described above, the output of the multiplier is post-normalized to
represent the product in the same format as the operands. In the embodiment of
Figure 2, in contrast, the output of floating point multiplier 210 is not
post-normalized. Instead, it is converted to a different floating point format.

Floating point adder 230 receives the converted product on node 222, and
also receives a previous sum on node 232. Floating point adder 230 then produces a
present sum on node 232. It should be noted that the output of floating point adder
230 is not post-normalized prior to being fed back for accumulation. The lack of a
post-normalization circuit in the feedback path provides for a faster FMAC.
Post-normalization circuit 250 receives the sum on node 232 and produces a result on
node 252. Again, it should be noted that the post-normalization operation is reserved
for the end of the multiply-accumulate circuit rather than immediately after both the
multiplier and the adder.

In the embodiments represented by Figure 2, post-normalization circuit 250
receives an enable signal on node 254. The enable signal allows the
post-normalization circuitry to be turned off while the majority of the multiplications
and accumulations are performed, and then turned on at the end of the operation when
the result is generated. In this manner, post-normalization circuit 250 can be turned
off for a majority of the time, thereby saving power.

Integrated circuit 200 can be any type of integrated circuit capable of
including a multiply accumulate circuit. For example, integrated circuit 200 can be a
processor such as a microprocessor, a digital signal processor, a microcontroller, or
the like. Integrated circuit 200 can also be an integrated circuit other than a
processor such as an application-specific integrated circuit (ASIC), a
communications device or a memory controller.

In general, floating-point numbers are represented as a concatenation of a
sign bit, an exponent field, and a significand field (also referred to as the mantissa).
In the IEEE single precision floating-point format, the most significant bit (integer
bit) of the mantissa is not represented. The most significant bit of the mantissa has
an assumed value of 1, except for denormal numbers, whose most significant bit of
the mantissa is 0. A single precision floating point number as specified by the IEEE
standard has a 23 bit mantissa field, an eight bit exponent field, and a one bit sign
field. The remainder of this description is arranged to describe multiply-accumulate
operations on IEEE single precision floating point numbers, but this is not a
limitation of the invention. IEEE compliant numbers have been chosen for
illustration of the present invention because of their widespread use, but one skilled
in the art will understand that any other floating point format can be utilized without
departing from the scope of the invention.
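The field layout described above can be illustrated with a short software sketch.
The function name decode_single is hypothetical and is not part of the described
circuit; the sketch only unpacks the sign, the biased exponent, and the mantissa with
its implied integer bit restored, and it does not treat infinities or NaNs specially.

```python
import struct

def decode_single(x):
    """Split a value, rounded to IEEE single precision, into its three fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                   # one bit sign field
    exponent = (bits >> 23) & 0xFF      # eight bit biased exponent field
    fraction = bits & 0x7FFFFF          # 23 bit stored mantissa field
    # Restore the integer bit that is not stored: 1 for normal numbers,
    # 0 for denormal numbers (exponent field of zero).
    integer_bit = 0 if exponent == 0 else 1
    mantissa24 = (integer_bit << 23) | fraction
    return sign, exponent, mantissa24
```

For example, decode_single(1.0) yields a sign of 0, a biased exponent of 127, and a
24 bit mantissa of 0x800000.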
Figure 3 shows the exponent and mantissa paths of a floating point
multiply-accumulate circuit. The various elements of Figure 2 are shown again in
Figure 3, but with slightly more detail. In particular, each element, where
appropriate, is shown broken down into an exponent path and a mantissa path.
Operations involving the sign bits of the floating point numbers are not shown in
Figure 3. Instead, all operations involving sign bits are presented in detail in later
figures. For all floating point numbers referred to in this description, all sign bits,
exponent fields, and mantissa fields are labeled with a capital S, E, and M,
respectively, with an identifying subscript. For example, floating point number A
includes sign bit S_a, exponent field E_a, and mantissa field M_a, and floating point
number B includes sign bit S_b, exponent field E_b, and mantissa field M_b.

Floating point multiplier 210 includes exponent path 302 and mantissa path
304. Floating point multiplier 210 also includes an exclusive-or gate (not shown) to
generate the sign of the product, S_p, from the signs of the operands, S_a and S_b, as
is well known in the art. Exponent path 302 includes an exponent summer that receives
exponents E_a and E_b on nodes 301 and 303 respectively, and sums them with
negative 127 to produce the exponent of the product, E_p, on node 308. E_a and E_b
are each eight bit numbers, as is E_p. Negative 127 is summed with the exponent fields
because the IEEE single precision floating point format utilizes biased exponents.
Exponent path 302 can be implemented using standard adder architectures as are well
known in the art.
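The sign and exponent handling described above can be sketched in software as
follows. The function name is hypothetical, and the sketch assumes the result fits in
eight bits; range checking for exponent overflow or underflow is not described in this
passage and is omitted.

```python
BIAS = 127  # exponent bias of the IEEE single precision format

def product_sign_and_exponent(s_a, e_a, s_b, e_b):
    """Return (S_p, E_p) from the operand sign bits and biased exponents."""
    s_p = s_a ^ s_b            # exclusive-or of the operand signs
    # Each operand exponent carries one bias, so the sum carries two;
    # adding negative 127 leaves exactly one bias in E_p.
    e_p = e_a + e_b - BIAS
    return s_p, e_p
```

For example, multiplying 2.0 (biased exponent 128) by -4.0 (biased exponent 129)
gives a negative sign and a biased exponent of 130, which is the exponent field of 8.0.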

Mantissa path 304 receives mantissas M_a and M_b on nodes 305 and 307,
respectively. Mantissa path 304 includes a mantissa multiplier that multiplies
mantissas M_a and M_b, and produces the mantissa of the product, M_p, on node 306.
Mantissas M_a and M_b are each 23 bits in accordance with the IEEE standard, and
mantissa M_p is 24 bits in carry-save format. Mantissa path 304 and carry-save
format are described in more detail with reference to Figure 4 below.

The exponent of the product, E_p, is an eight bit number with a least
significant bit weight equal to one. For example, an E_p field of 00000011 has a value
of three, because the least significant bit has a weight of one, and the next more
significant bit has a weight of two. For the purposes of this description, this
exponent format is termed "base 2," and the product is said to be in base 2. Floating
point conversion unit 220 converts the product from base 2 to a different base. For
example, exponent path 312 is an exponent conversion unit that sets the least
significant five bits of the exponent field to zero, and truncates the exponent field to
three bits, leaving the least significant bit of the exponent of the converted product,
E_cp, with a weight of 32. For example, an E_cp field of 011 has a value of 96, because
the least significant bit has a weight of 32, and the next more significant bit has a
weight of 64. For the purposes of this description, this exponent format is termed
"base 32," and the converted product is said to be in base 32.

Mantissa path 314 of floating point conversion unit 220 shifts the mantissa of
the product, M_p, to the left by the number of bit positions equal to the value of the
least significant five bits of the exponent of the product, E_p. Mantissa path 314
presents a 57 bit mantissa in carry-save format on node 316. Floating point
conversion unit 220 does not operate on the sign bit, so the sign of the converted
product, S_cp, is the same as the sign of the product, S_p. One embodiment of floating
point conversion unit 220 is shown in more detail in Figure 5.
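The base 2 to base 32 conversion described above can be sketched with plain
integers. This is an illustration only: the function name to_base32 is hypothetical,
the mantissa is modeled as an ordinary unsigned integer rather than a carry-save pair,
and sign handling is left out.

```python
def to_base32(e_p, m_p):
    """Convert (E_p, M_p) in base 2 form to (E_cp, M_cp) in base 32 form."""
    e_cp = e_p >> 5              # keep the three most significant exponent bits
    m_cp = m_p << (e_p & 0x1F)   # shift left by the five discarded exponent bits
    return e_cp, m_cp

# The represented magnitude is unchanged by the conversion:
#   M_p * 2**E_p == M_cp * 2**(32 * E_cp)
e_cp, m_cp = to_base32(99, 0xC00000)
assert 0xC00000 * 2**99 == m_cp * 2**(32 * e_cp)
```

With a 24 bit mantissa and a maximum shift of 31, the shifted magnitude occupies at
most 55 bits, which fits within the 57 bit mantissa described above.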

Floating point adder 230 includes adder exponent path 322, adder mantissa
path 324, and magnitude comparator 325. Exponent path 322 includes an exponent
accumulation stage that receives the converted product exponent, E_cp, on node 318,
and the feedback exponent, E_fb, on node 328, and produces the sum exponent E_sum on
node 328. The sum is a base 32 number in carry-save format. Exponent path 322
also produces control signals on node 323. Node 323 carries information from
exponent path 322 to mantissa path 324 to signify whether the two exponents are
equal (E_cp = E_fb), whether one exponent is greater than the other (E_cp > E_fb,
E_cp < E_fb), and whether one exponent is one greater than the other
(E_cp = E_fb + 1, E_fb = E_cp + 1). Because the converted product and the sum are
floating point numbers in base 32 format, an exponent that differs by a least
significant bit differs by a "weight" of thirty-two. Exponent path 322 also receives an
overflow signal from mantissa path 324 on node 323.

Mantissa path 324 includes a mantissa accumulator that receives mantissa
fields M_cp and M_fb on nodes 316 and 326, respectively, and produces mantissa field
M_sum on node 326. Mantissa path 324 also receives control signals on node 323 from
exponent path 322, and produces the overflow signal and sends it to exponent path
322. Embodiments of adder exponent path 322 and adder mantissa path 324 and the
signals therebetween are described in more detail with reference to Figures 8 and 9,
below. Magnitude comparator 325 receives mantissa fields M_cp and M_fb on nodes
316 and 326, respectively, and produces a magnitude compare (MC) result on node
327. MC is used by post-normalization circuit 250 to aid in the determination of the
sign of the result, as is further explained below with reference to Figures 11 and 12.

Post-normalization circuit 250 receives the base 32 carry-save format sum 
from floating point adder 230, and converts it to an IEEE single precision floating 
point number. One embodiment of post-normalization circuit 250 is described in 
more detail with reference to Figure 11, below. 


Multiplier 

As previously described, multiplier 210 includes an exclusive-or function for
sign bit generation, an exponent path for generating the exponent of the product, and
a mantissa path to generate a mantissa of the product in carry-save format. Figure 4
shows an embodiment of multiplier mantissa path 304. Mantissa path 304 includes a
plurality of compressor trees 410. Each of compressor trees 410 receives a part of
mantissa M_a on node 305 and a part of mantissa M_b on node 307, and produces
carry and sum signals to form mantissa M_p on node 306 in carry-save format.
Carry-save format is a redundant format wherein each bit within the number is
represented by two physical bits, a sum bit and a carry bit. Therefore, a 24 bit number
in carry-save format is represented by 48 physical bits: 24 bits of sum, and 24 bits of
carry. Each of compressor trees 410 generates a single sum bit and a single carry bit.
Embodiments that produce a 24 bit carry-save number include 24 compressor trees
410.

Prior art multipliers that utilize compressor trees typically include a carry
propagate adder (CPA) after the compressors to convert the carry-save format
product into a binary product. See, for example, G. Goto, T. Sato, M. Nakajima, &
T. Sukemura, "A 54 x 54 Regularly Structured Tree Multiplier," IEEE Journal of
Solid State Circuits, p. 1229, Vol. 27, No. 9, Sept., 1992. The various embodiments
of the method and apparatus of the present invention do not include a CPA after the
compressors, but instead utilize the product directly in carry-save format.

Each compressor tree 410 receives carry signals from a previous stage, and
produces carry signals for the next stage. For example, the least significant
compressor tree receives zeros on node 420 as carry in signals, and produces carry
signals on node 422 for the next significant stage. The most significant compressor
tree receives carry signals from the previous stage on node 424.

Each compressor tree 410 includes a plurality of 3-2 compressors and/or 4-2
compressors arranged to sum partial products generated by partial product
generators. For a discussion of compressors, see Neil H. E. Weste & Kamran
Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective," 2nd Ed.,
pp. 554-558 (Addison Wesley Publishing 1994).
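As a software illustration of the 3-2 compressors mentioned above, the following
sketch reduces three bit vectors to a sum vector and a carry vector whose total equals
the total of the inputs. The function name compress_3_2 and the width parameter are
hypothetical, and the sketch assumes the carry vector is aligned by shifting it one
position to the left, which is the usual full-adder weighting rather than a detail taken
from Figure 4.

```python
def compress_3_2(x, y, z, width):
    """Reduce three width-bit vectors to a (sum, carry) carry-save pair."""
    mask = (1 << width) - 1
    # Each bit position acts as an independent full adder.
    s = (x ^ y ^ z) & mask
    # The carry out of bit i has the weight of bit i + 1, hence the shift.
    c = (((x & y) | (x & z) | (y & z)) << 1) & mask
    return s, c

s, c = compress_3_2(5, 3, 6, 8)
assert (s + c) & 0xFF == (5 + 3 + 6) & 0xFF
```

No carry propagates from one bit position to the next inside the compressor, which is
why carry-save arithmetic avoids the delay of a carry propagate adder.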

Floating Point Conversion Unit 
Figure 5 shows a floating point conversion unit. Floating point conversion
unit 220 receives the eight bit exponent field of the product, E_p[7:0], where E_p[7] is
the most significant bit, and E_p[0] is the least significant bit. The exponent of the
converted product, E_cp, is created by removing the least significant five bits from the
exponent field. E_cp has a least significant bit equal to E_p[5], which has a weight of
thirty-two.

Shifter 520 receives the 24 bit product mantissa, M_p, in carry-save format,
and shifts both the sum field and the carry field left by an amount equal to the value
of the least significant five bits of the product exponent, E_p[4:0]. If the product is
negative, multiplexer 540 selects a negated mantissa that is negated by negation
circuit 530. M_cp is a 57 bit number in carry-save format, and E_cp is a three bit
exponent.

Figure 6 shows a carry-save negation circuit. Carry-save negation circuit 530
negates a number in carry-save format. Both the sum and carry signals are inverted,
and combined with a constant of two using a three-to-two compressor. Carry-save
negation circuit 530 negates a 57 bit carry-save number. An example using a six bit
carry-save number is now presented to demonstrate the operation of three-to-two
compressors to negate a carry-save number. A six bit carry-save number with a
value of six is represented as follows:

000010 <- sum
000100 <- carry

When both the sum and carry bits above are summed, the result is 000110,
which equals six. The carry-save negation circuit inverts the sum and carry signals
and adds two as follows:

111101 <- inverted sum
111011 <- inverted carry
000010 <- constant of two

000100 <- resulting sum
111011 <- resulting carry
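The six bit example above can be replayed in software. The sketch below is an
illustration under stated assumptions, not a description of Figure 6: the function name
is hypothetical, and it returns both the per-bit carry outputs of the compressor and a
copy shifted left by one position. The per-bit carry outputs match the "resulting carry"
row above; the shifted copy is the form for which sum plus carry equals negative six in
six bit two's complement.

```python
def negate_carry_save(s, c, width=6):
    """Negate a carry-save pair by inverting both vectors and adding two."""
    mask = (1 << width) - 1
    x, y, z = ~s & mask, ~c & mask, 2          # inverted sum, inverted carry, two
    new_sum = x ^ y ^ z                        # three-to-two compressor sum bits
    carry_out = (x & y) | (x & z) | (y & z)    # per-bit carry outputs
    aligned_carry = (carry_out << 1) & mask    # assumed alignment: weight of next bit
    return new_sum, carry_out, aligned_carry

new_sum, carry_out, aligned_carry = negate_carry_save(0b000010, 0b000100)
assert new_sum == 0b000100 and carry_out == 0b111011
assert (new_sum + aligned_carry) & 0x3F == (-6) & 0x3F
```

The constant of two arises because negating each of the two vectors in two's
complement requires an inversion plus one, and there are two vectors.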

Figure 7 shows base 2 and base 32 floating point number representations.
Base 2 floating point number representation 710 is the representation produced by
floating point multiplier 210 (Figures 2 and 3), and base 32 floating point number
representation 720 is the representation produced by floating point conversion unit
220 (Figures 2 and 3). Base 2 floating point number representation 710 includes sign
bit 712, eight bit exponent field 714, and twenty-four bit mantissa field 716. Base 2
floating point number representation 710 is in the IEEE standard single precision
format with an explicit integer bit added to increase the mantissa from twenty-three
bits to twenty-four bits. Base 32 floating point number 720 includes a sign bit 722, a
three bit exponent field 724, and a fifty-seven bit mantissa field 726. Floating point
conversion unit 220 (Figure 5) converts floating point numbers in representation 710
to floating point numbers in representation 720.

Exponent 724 is equal to the most significant three bits of exponent 714. The
least significant bit of exponent 724 has a "weight" of thirty-two. In other words, a
least significant change in exponent 724 corresponds to a mantissa shift of thirty-two
bits. For this reason, floating point representation 720 is referred to as a "base 32"
floating point representation.



Floating Point Adder 
Figure 8 shows an exponent path of a floating point adder. Exponent path
322 includes multiplexors 802, 804, and 806, comparator 820, incrementers 812 and
814, and logic 810. Incrementers 812 and 814 pre-increment E_fb and E_cp to produce
an incremented E_fb and an incremented E_cp, respectively. When either exponent E_fb
or E_cp is incremented, the value of the exponent is changed by thirty-two with respect
to the mantissa. Accordingly, incrementers 812 and 814 are shown in Figure 8 with
the label "+32."

In operation, comparator 820 compares exponents E_fb and E_cp, and generates
logic outputs as shown in Figure 8. When E_fb is greater than E_cp, the (E_fb > E_cp)
signal controls multiplexors 802 and 804 to select E_fb and the incremented E_fb,
respectively. Otherwise, multiplexors 802 and 804 select E_cp and the incremented
E_cp, respectively. Multiplexor 806 selects either the exponent on node 805 or the
incremented exponent on node 807 based on the overflow trigger (OFT) signal on
node 811. OFT is asserted only if the OVF signal is asserted and the two three-bit
input exponents are either equal or differ by one. A difference of one between the
exponents is equal to a difference of thirty-two in a base 2 representation. Logic 810
receives OVF from the mantissa path and logic outputs from comparator 820, and
produces the OFT signal according to the following equation:

OFT = OVF AND ((E_fb = E_cp) OR (E_fb = E_cp + 1) OR (E_cp = E_fb + 1)).

When OFT is true, the exponent of the sum, E_sum, is chosen as the
incremented exponent on node 807, and when OFT is false, E_sum is chosen as the
greater exponent on node 805.
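The exponent selection described above can be summarized in a short behavioral
sketch. The function name exponent_path is hypothetical. The sketch increments after
selecting the greater exponent, which gives the same result as pre-incrementing both
exponents and selecting afterwards, and it ignores wraparound of the three bit field.

```python
def exponent_path(e_fb, e_cp, ovf):
    """Return (E_sum, OFT) for base 32 exponents E_fb and E_cp."""
    greater = e_fb if e_fb > e_cp else e_cp    # value on node 805
    incremented = greater + 1                  # value on node 807 ("+32")
    # OFT = OVF AND ((E_fb = E_cp) OR (E_fb = E_cp + 1) OR (E_cp = E_fb + 1))
    close = (e_fb == e_cp) or (e_fb == e_cp + 1) or (e_cp == e_fb + 1)
    oft = bool(ovf) and close
    e_sum = incremented if oft else greater    # selection by multiplexor 806
    return e_sum, oft
```

When the exponents differ by more than one, an asserted OVF has no effect on E_sum,
because OFT stays false.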

Comparator 820 compares three bit exponents and produces a plurality of
outputs that are logic functions of the inputs. Each logic output is a function of six
input bits: three bits from E_fb, and three bits from E_cp. This provides a very quick
logic path. In addition to the quick comparison made in the exponent path, the
mantissa path includes constant shifters that conditionally shift mantissas by a fixed
amount. The combination of a quick exponent comparison in the exponent path and
a quick shift in the mantissa path provides for a fast floating point adder circuit. The
constant shifter is described in more detail below with reference to Figure 9.

Figure 9 shows a mantissa path of a floating point adder. Mantissa path 324
includes constant shifters 902, 904, and 906, adder circuit 910, multiplexors 912 and
914, and logic 916. Constant shifters 902, 904, and 906 can be used in place of
variable shifters because a change in the least significant bit of the exponent is equal
to a shift of thirty-two. This simplification saves on the amount of hardware
necessary to implement the adder, and also decreases execution time. In some
embodiments, constant shifters 902, 904, and 906 are implemented as a series of
two-input multiplexors.

Mantissa path 324 receives mantissa M_fb and mantissa M_cp. In operation,
constant shifter 904 shifts M_cp thirty-two bit positions to the right when E_fb is
greater than E_cp, and constant shifter 902 shifts M_fb thirty-two bit positions to the
right when E_cp is greater than E_fb. When E_fb is equal to E_cp, then neither mantissa
is shifted in mantissa path 324. After constant shifters 902 and 904, mantissa path 324
separates into two subpaths: the adder path and the bypass path. The adder path
includes adder 910 and constant shifter 906, while the bypass path includes
multiplexor 912.

Adder circuit 910 compresses the two mantissas in carry-save format on
nodes 920 and 922 and produces the result in carry-save format on node 924. In
some embodiments, adder circuit 910 includes four-to-two compressors to compress
the two input mantissas into the result on node 924. If an overflow occurs in adder
circuit 910, the OVF signal is asserted and constant shifter 906 shifts the mantissa
produced by adder circuit 910 thirty-two bit positions to the right. The OVF signal is
sent to exponent path 322 to conditionally select an incremented exponent, as
described above with reference to Figure 8. In some embodiments, adder circuit 910
can be powered down when not in use. For example, when M_sum is chosen from the
bypass path rather than the adder path, adder circuit 910 can be shut down to save
power. In the embodiment of Figure 9, adder circuit 910 can be powered down by
asserting the PWRDN signal on node 950.

Multiplexor 912, like adder circuit 910, receives mantissas on nodes 920 and
922. Unlike adder circuit 910, however, multiplexor 912 selects one of the inputs
rather than adding them. Multiplexor 912 selects the mantissa that corresponds to
the larger floating point number. For example, when E_fb is greater than E_cp,
multiplexor 912 selects M_fb. Also for example, when E_cp is greater than E_fb,
multiplexor 912 selects M_cp. Multiplexor 912 drives node 913 with the selected
mantissa.

Multiplexor 914 selects the mantissa of the sum, M_sum, from the adder path
when the input exponents are equal or differ by one, and selects M_sum from the
bypass path when the input exponents differ by more than one. When the input
exponents differ by more than one, a shift of sixty-four or more would be needed to
align the mantissas for addition, and the mantissas in the embodiment of Figure 9 are
fifty-seven bits long. The output of mantissa path 324 is a fifty-seven bit number in
carry-save format.
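The alignment and path selection described above can be sketched behaviorally as
follows. This is a simplified model under stated assumptions: the function name
mantissa_path is hypothetical, mantissas are plain signed integers rather than
carry-save pairs, and the overflow shift of constant shifter 906 is left out.

```python
def mantissa_path(e_fb, m_fb, e_cp, m_cp):
    """Return M_sum for feedback and converted-product inputs (no overflow shift)."""
    # Constant shifters 902 and 904: the mantissa with the smaller exponent
    # is shifted right by a fixed thirty-two bit positions.
    fb_aligned = m_fb >> 32 if e_cp > e_fb else m_fb
    cp_aligned = m_cp >> 32 if e_fb > e_cp else m_cp
    if abs(e_fb - e_cp) <= 1:
        # Adder path: exponents are equal or differ by one.
        return fb_aligned + cp_aligned
    # Bypass path: the mantissa of the larger exponent passes through unshifted.
    return fb_aligned if e_fb > e_cp else cp_aligned

# Exponents differ by one: the product mantissa is aligned by 32 bits, then added.
assert mantissa_path(3, 5, 2, 7 << 32) == 12
```

Because the only possible alignment is a fixed shift of thirty-two, the model needs
one conditional per shifter, with no variable shift amount.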
Figure 10 shows an overflow detection circuit. Overflow detection circuit
1000 includes an exclusive-or gate to generate the OVF overflow signal when the
output of the adder has overflowed. Overflow detection circuit 1000 resides in adder
circuit 910 (Figure 9), and generates the OVF signal that is sent to exponent path 322
(Figure 8). Overflow detection circuit 1000 receives the most significant two bits of
the sum in the carry-save format, and produces the overflow signal as the
exclusive-or of these two bits. The six bit carry-save numbers from the previous
negation example are now applied as examples in the context of overflow detection
circuit 1000. Below these two examples, a more complex example is given.

A positive six is shown below as a carry-save number having sum and carry
components. Each number includes two sign bits broken out from the rest of the
number for clarity. The leftmost sign bit of the sum is S1, and the adjacent bit to the
right is S0. Likewise, the leftmost sign bit of the carry is C1, and the adjacent bit to
the right is C0. In this example, S1 and S0 are both zero, and there is no overflow.

00 0010 <- sum
00 0100 <- carry

A negative six is shown below as a carry-save number having sum and carry
components. As in the previous example, the two sign bits of the sum and carry are
broken out from the rest of the number for clarity. In this example, S1 and S0 are
both zero, and there is no overflow. It should be noted that C1 and C0 are both one,
but that C1 and C0 are not used as inputs to overflow detection circuit 1000, and so
are irrelevant to the overflow determination.

00 0100 <- sum
11 1011 <- carry

In the previous two examples, no overflow existed. Another example is now
provided that represents an overflow condition. Suppose that two numbers, "A" and
"B," represented in carry-save format, are summed by adder circuit 910. "A" is
equal to twenty two, and "B" is equal to sixteen. The two numbers are shown below
as four bit carry-save numbers, each having two sign bits shown separated to the left.

00 1010 <- sum of A
00 1100 <- carry of A

00 1000 <- sum of B
00 1000 <- carry of B

The resultant of "A" plus "B" is represented as:

01 1110 <- resultant sum of A plus B
00 1000 <- resultant carry of A plus B

The maximum number that can be represented by a four bit carry-save
number is thirty one. The resultant of A plus B in this example is equal to thirty
eight, so overflow exists. Overflow detection circuit 1000 correctly detects the
overflow condition because S1 and S0 are different.
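The overflow test applied in the three examples above reduces to one exclusive-or,
which can be sketched as follows. The function name and the width parameter are
hypothetical; the sum vector is passed as an integer that includes its two sign bits.

```python
def overflow_detect(sum_bits, width=6):
    """Return OVF as the exclusive-or of the two most significant sum bits."""
    s1 = (sum_bits >> (width - 1)) & 1   # leftmost sign bit of the sum
    s0 = (sum_bits >> (width - 2)) & 1   # adjacent bit to the right
    return bool(s1 ^ s0)

assert overflow_detect(0b000010) is False   # sum of positive six
assert overflow_detect(0b000100) is False   # sum of negative six
assert overflow_detect(0b011110) is True    # resultant sum of A plus B
```

The carry vector is not an input, matching the observation above that C1 and C0 are
irrelevant to the overflow determination.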

Post-Normalization 



Figure 11 shows a post-normalization circuit. Post-normalization circuit 250
includes sign detection circuit 1104, negation circuit 1102, multiplexor 1106, leading
zero detector (LZD) 1110, carry propagate adder (CPA) 1108, shifters 1120 and
1150, and subtracters 1130 and 1140. Post-normalization circuit 250 receives the
mantissa of the sum, M_sum, and the exponent of the sum, E_sum, generates the sign of
the result, S_result, and converts the carry-save number into IEEE standard single
precision format.

In some embodiments, one or more circuits within post-normalization circuit
250 are responsive to the enable signal on node 254 (Figure 2). Each such circuit is
put into a low power state or completely powered down as a function of the state of
the enable signal. Because some portions of post-normalization circuit 250 are outside
the feedback loop, those portions only need to be turned on after the accumulation is
complete. For example, when computing Σ(a_i × b_i) over 256 different values of i,
much of post-normalization circuit 250 can be turned off for the first 255
accumulations, and only turned on for the 256th accumulation, thereby saving power.
The invention is not limited by the mechanism used to limit the power usage as a
function of the enable signal. Example mechanisms include controlling the reset of
sequential circuits, and controlling series transistors of the type commonly used to
limit leakage currents. One skilled in the art will recognize that many possible
mechanisms exist for limiting power consumption as a function of the enable signal.
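
The effect of the enable signal can be illustrated with a simple behavioral model. The sketch below is not a circuit description. It assumes that the enable signal is asserted only on the final accumulation, and it uses ordinary integer arithmetic in place of the carry-save datapath. The names accumulate_with_enable and post_normalize are hypothetical.

```python
def accumulate_with_enable(pairs, post_normalize):
    """Accumulate a*b over all pairs; post-normalize only when enabled."""
    acc = 0
    post_norm_runs = 0
    result = None
    last = len(pairs) - 1
    for i, (a, b) in enumerate(pairs):
        acc += a * b              # work inside the feedback loop
        enable = (i == last)      # assumed: asserted on the final pass only
        if enable:
            result = post_normalize(acc)  # work outside the feedback loop
            post_norm_runs += 1
    return result, post_norm_runs


# 256 accumulations; float() stands in for the post-normalization step.
pairs = [(i, 2) for i in range(256)]
result, runs = accumulate_with_enable(pairs, post_normalize=float)
print(result, runs)  # 65280.0 1
```

In this model the post-normalization step runs once for 256 accumulations, which corresponds to the power saving described above.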

M_sum is received by sign detection circuit 1104, negation circuit 1102, and
multiplexor 1106. Sign detection circuit 1104 receives M_sum and the magnitude
compare (MC) signal produced by magnitude comparator 325 (Figure 3), and
produces S_sum, the sign of the sum. S_sum is fed back to magnitude comparator 325 as
S_fb. The operation of sign detection circuit 1104 and magnitude comparator 325 is
described in more detail below with reference to Figure 12. Multiplexor 1106 selects
between M_sum and a negated version thereof based on the sign of the sum, S_sum. This
assures that the resulting mantissa is unsigned. Negation circuit 1102 can be a
negation circuit such as that shown in Figure 7.

CPA 1108 receives the mantissa in carry-save format and converts it to a
binary number. Carry propagate adders are well known in the art. For an example of
a carry propagate adder, see the Goto reference cited above with reference to Figure
4. LZD 1110 detects the number of leading zeros in the mantissa, and provides that
information to subtracter 1130 and shifter 1120. For a discussion of leading zero
detectors, see Kyung T. Lee and Kevin J. Nowka, "1 GHz Leading Zero Anticipator
Using Independent Sign-Bit Determination Logic," 2000 IEEE Symposium on VLSI
Circuits Digest of Technical Papers, pp. 194-195. Subtracter 1130 subtracts the
number of leading zeros from the exponent, and shifter 1120 shifts the mantissa left
to remove the leading zeros. The exponent and mantissa are then converted to IEEE
single precision format by subtracter 1140 and shifter 1150.
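
The sequence performed by CPA 1108, LZD 1110, subtracter 1130, and shifter 1120 can be sketched in software as follows. This is a minimal illustration under stated assumptions: the mantissa is already unsigned, the eight bit width is hypothetical, a zero mantissa is returned unchanged, and the final conversion to IEEE single precision format by subtracter 1140 and shifter 1150 is omitted.

```python
def post_normalize(sum_word, carry_word, exponent, width=8):
    """Convert a carry-save mantissa to binary and remove leading zeros."""
    mask = (1 << width) - 1

    # Carry propagate add: collapse the carry-save pair into one binary number.
    mantissa = (sum_word + carry_word) & mask
    if mantissa == 0:
        return 0, exponent  # assumed handling; there is no leading one to find

    # Leading zero detection within the fixed width.
    leading_zeros = width - mantissa.bit_length()

    # Shift the mantissa left and subtract the same count from the exponent.
    return (mantissa << leading_zeros) & mask, exponent - leading_zeros


# Sum word 5 plus carry word 6 gives 0b00001011, which has four leading zeros.
mantissa, exponent = post_normalize(0b0101, 0b0110, exponent=10)
print(bin(mantissa), exponent)  # 0b10110000 6
```

Because the left shift and the exponent subtraction use the same count, the value represented by the mantissa and exponent pair is unchanged by the normalization.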

Figure 12 shows a sign detection circuit and a magnitude comparator.
Magnitude comparator 325 is the same magnitude comparator shown in Figure 3. It
is shown in more detail here to illustrate the combined operation of magnitude
comparator 325 and sign detection circuit 1104. Magnitude comparator 325 includes
subtracter 1210 and multiplexer 1220. Subtracter 1210 controls multiplexer 1220
such that MC is equal to the sign of the larger of the two mantissas it compares,
one of which is the fed-back mantissa, M_fb. For example, when the other mantissa
is larger than M_fb, MC is equal to the sign associated with that other mantissa.
Likewise, when M_fb is the larger of the two, MC is equal to S_fb. Sign detection
circuit 1104 receives MC and also receives the most significant bits of the sum and
carry of M_sum, labeled SI and CI, respectively. Sign detection circuit 1104 includes
logic that generates a sign bit in accordance with the following truth table, where
"X" signifies either a 1 or a 0, and the last row (SI and CI both 1) is an
impossible case.

SI   CI   MC   Sign
0    0    X    0
0    1    X    1
1    0    0    0
1    0    1    1
1    1    X    (impossible)

Magnitude comparator 325 operates in parallel with adder mantissa path 324,
so MC is available for sign detection circuit 1104 at substantially the same time as
M_sum. In this manner, the operation of sign detection circuit 1104 does not
appreciably increase the delay within the feedback loop.
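
The sign detection truth table given earlier in this section can be expressed as a short function. The sketch below is illustrative only. The name detect_sign is hypothetical, and raising an error for the impossible case is an assumption made for the software model, since the description does not say how the circuit treats that input.

```python
def detect_sign(si, ci, mc):
    """Return the sign bit for the most significant sum bit SI,
    the most significant carry bit CI, and the magnitude compare bit MC."""
    if si == 0 and ci == 0:
        return 0      # MC is a don't-care
    if si == 0 and ci == 1:
        return 1      # MC is a don't-care
    if si == 1 and ci == 0:
        return mc     # the sign follows the magnitude comparator
    # SI = 1 and CI = 1 is listed as an impossible case.
    raise ValueError("SI = CI = 1 is an impossible case")


# Print each reachable row of the truth table.
for si, ci, mc in [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 0, 1)]:
    print(si, ci, mc, detect_sign(si, ci, mc))
```

Only the row with SI equal to 1 and CI equal to 0 depends on MC, which is why MC needs to be available at substantially the same time as the mantissa of the sum.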

Conclusion 

The method and apparatus of the present invention provide a fast
multiply-accumulate operation that can be made compliant with any floating point format.
Furthermore, the method and apparatus of the present invention can provide
precision comparable to the precision available using prior art double precision
arithmetic units, in part because the mantissa fields are expanded. In some
embodiments, IEEE standard single precision operands are multiplied and the
products are summed. The multiplier includes a compressor tree to generate a
product with a binary exponent and a mantissa in carry-save format. The product is
converted into a number having a three bit exponent and a fifty-seven bit mantissa in
carry-save format for accumulation. An adder circuit accumulates the converted
products in carry-save format. Because the products being summed are in carry-save
format, post-normalization is avoided within the adder feedback loop. In addition,
because the adder operates on floating point number representations having
exponents with a least significant bit weight of thirty-two, exponent comparisons
within the adder exponent path are fast, and variable shifters can be avoided in the
adder mantissa path. When the adder is not pipelined, a fast single cycle
accumulation is realized with the method and apparatus of the present invention.

It is to be understood that the above description is intended to be illustrative,
and not restrictive. Many other embodiments will be apparent to those of skill in the
art upon reading and understanding the above description. The scope of the
invention should, therefore, be determined with reference to the appended claims,
along with the full scope of equivalents to which such claims are entitled.